perf(ensv2): address label-heal and DB-lookup hotspots#1989
perf(ensv2): address label-heal and DB-lookup hotspots#1989
Conversation
The ensv2 plugin re-ran expensive label healing on every NewOwner / NameRegistered / NameRenewed event, including repeats for the same labelHash. For addr.reverse subnames this was especially costly — each repeat fired both eth_getTransactionReceipt and debug_traceTransaction against the ENS Root chain, even though the label had already been discovered. Gate the heal path on a cheap labelExists lookup at each call site (subgraph already had the equivalent guard via `domain.name === null`). ensureUnknownLabel no longer duplicates the existence check since all callers now check first. Bench on mainnet (10-min windows, ENSRainbow stubbed): pre: 1.54M events, 48k RPCs (23.9k traces) post: 2.20M events, 25k RPCs (12.3k traces) Per-event handler cost for ENSv1Registry:NewOwner dropped 57%, BaseRegistrar:NameRegistered dropped 62%.
ensureAccount is called from many handlers on every event, but the set of distinct accounts grows much more slowly than the event stream — the zero address, common controllers, and active registrants recur constantly. A process-local Set short-circuits repeat upserts. Safe because the insert is onConflictDoNothing: repeats are idempotent and the memo is purely an optimization. On process restart the Set resets, which just costs one redundant (idempotent) DB op the first time each account is seen again. Benchmark (average handler duration over 10M events on mainnet, ensv2,protocol-acceleration, ENSRainbow stubbed): no cache: 2193.0s wall, 4,560 eps, 40.83M DB ops cached: 2088.8s wall, 4,787 eps, 36.67M DB ops (-4.16M inserts) ensv2/ENSv1RegistryOld:NewOwner per-event cost dropped ~16% (0.178ms to 0.150ms); cache hit rate ~40%; wall-clock ~5% faster overall.
nodeIsMigrated is called as a precondition on every ENSv1RegistryOld event handler (NewOwner, Transfer, NewTTL, NewResolver) in the ensv2 plugin plus protocol-acceleration's own registry handler. At scale that is millions of PK lookups against migratedNode during a backfill. Cache both results (migrated + not-migrated) in process-local Sets. migrateNode updates both sets in lockstep so a cached "not migrated" answer is invalidated the instant a node transitions. Bidirectional caching is required because during historical backfill most domains are in the not-migrated state at read time (the migration event for them occurs later in the event stream), so a one-sided "migrated"-only cache wouldn't help the hot path. Safety: - migratedNode is append-only, so "migrated" cache entries never become stale. - Restart-safe: both sets repopulate from DB on cache miss. Benchmark (average handler duration over 1M events on mainnet, ensv2,protocol-acceleration, ENSRainbow stubbed): baseline: 255.8s wall, 3,909 eps, 1.31M find ops memo: 227.6s wall, 4,394 eps, 898k find ops (-31% finds) +12% throughput. protocol-acceleration/ENSv1RegistryOld:NewResolver per-event cost dropped ~20%, Resolver:AddrChanged dropped ~16%.
Emit a structured log line from eventHandlerPreconditions reporting cumulative events dispatched, elapsed time, and events-per-second at most once every 60s. Overhead is one Date.now() and a counter increment per event. Useful as a zero-setup throughput signal during perf work alongside the Ponder /metrics endpoint.
Local Prometheus + Grafana bundle for benchmarking ENSIndexer. Scrapes the indexer's /metrics endpoint (port 42069) every 5s and provisions a Grafana dashboard at /d/ensindexer tuned for perf work: top handlers by wall-clock share, p95 durations, events/sec, sync block + blocks/sec per chain, RPC req/s + latency, event-loop lag, Postgres queue size, and DB store queries/sec. Runs via docker compose; see the package README for usage.
|
The latest updates on your projects. Learn more about Vercel for GitHub.
|
|
|
Warning Rate limit exceeded
Your organization is not enrolled in usage-based pricing. Contact your admin to enable usage-based pricing to continue reviews beyond the rate limit, or try again in 20 minutes and 19 seconds. ⌛ How to resolve this issue?After the wait time has elapsed, a review can be triggered using the We recommend that you space out your commits to avoid hitting the rate limit. 🚦 How do rate limits work?CodeRabbit enforces hourly rate limits for each developer per organization. Our paid plans have higher rate limits than the trial, open-source and free plans. In all cases, we re-allow further reviews after a brief timeout. Please see our FAQ for further information. ℹ️ Review info⚙️ Run configurationConfiguration used: Organization UI Review profile: ASSERTIVE Plan: Pro Run ID: ⛔ Files ignored due to path filters (1)
📒 Files selected for processing (12)
📝 WalkthroughWalkthroughThe PR implements in-memory memoization caches across multiple indexing functions to skip repeated database lookups within a single indexer run, adds cumulative events-per-second measurement and logging to the event handler hot path, and introduces a new performance testing infrastructure package with Prometheus and Grafana monitoring. Changes
Possibly Related PRs
Poem
Estimated Code Review Effort🎯 3 (Moderate) | ⏱️ ~22 minutes 🚥 Pre-merge checks | ✅ 4 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
Greptile SummaryThree targeted perf fixes to the Confidence Score: 5/5Safe to merge; all remaining findings are P2 with no impact when ENSRainbow is healthy. The three core optimizations are well-reasoned and correctly implemented. The only substantive concern is that the apps/ensindexer/src/plugins/ensv2/handlers/ensv1/RegistrarController.ts and apps/ensindexer/src/lib/ensv2/label-db-helpers.ts Important Files Changed
Sequence DiagramsequenceDiagram
participant EV as ENSv1Registry:NewOwner
participant RC as RegistrarController:NameRegistered
participant LD as label-db-helpers
participant RB as ENSRainbow
participant DB as Database
EV->>LD: labelExists(labelHash)?
LD->>DB: find(label, labelHash)
DB-->>LD: null
LD-->>EV: false - call ensureUnknownLabel
EV->>RB: labelByLabelHash(labelHash)
alt ENSRainbow has label
RB-->>EV: plaintext label
EV->>DB: insert label (onConflictDoUpdate)
else ENSRainbow unavailable
RB-->>EV: null
EV->>DB: insert hash-only label (onConflictDoNothing)
end
RC->>LD: labelExists(labelHash)?
LD->>DB: find(label, labelHash)
DB-->>LD: existing entry
LD-->>RC: true - SKIP ensureLabel
Note over RC,DB: If hash-only was stored above, plaintext from event is never saved
Reviews (1): Last reviewed commit: "chore: add @ensnode/ensindexer-perf-test..." | Re-trigger Greptile |
Grafana loads any *.yml under provisioning/datasources/, but editors matching by filename try to validate prometheus.yml against the Prometheus config JSON schema, which has no `datasources` key, and surface a spurious "Property datasources is not allowed" warning. Renaming to datasources.yml (Grafana's own naming convention) dodges the schema collision.
There was a problem hiding this comment.
Actionable comments posted: 8
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (4)
apps/ensindexer/src/lib/ensv2/account-db-helpers.ts (1)
7-41: 🧹 Nitpick | 🔵 TrivialLGTM — memo is safe and the comment explains the invariants well.
The docstring correctly captures why this is safe (idempotent
onConflictDoNothingbacking store, restart-safe since re-inserts are no-ops). Ordering is also correct: the Set is mutated only after theawaitresolves, so a thrown insert won't leave the memo claiming success.One tiny optional nit:
Set<string>discards theAddressnominal type —new Set<Address>()would match the rest of the call site (interpretAddressreturnsAddress | null).Note: the memo grows monotonically with the number of distinct accounts seen during the indexer's lifetime. For ENS-scale backfills that's expected to stay in the low-millions range, which is fine for a
Set<string>, but worth keeping in mind if this module is ever reused in a longer-lived/larger-cardinality context.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@apps/ensindexer/src/lib/ensv2/account-db-helpers.ts` around lines 7 - 41, The memo Set is typed as Set<string> which discards the Address nominal type; change its declaration to use Set<Address> so the type matches interpretAddress's return and the rest of the call sites—update the ensuredAccounts declaration (symbol: ensuredAccounts) and ensureAccount references if needed, keeping interpretAddress and ensureAccount behavior unchanged.apps/ensindexer/src/lib/protocol-acceleration/registry-migration-status.ts (1)
11-48: 🧹 Nitpick | 🔵 TrivialConsider bounding
nonMigratedNodescache in steady-state indexing.The append-only invariant on
migratedNode(asserted in the docstring) holds in practice — there are no delete operations on this table anywhere in the codebase, so cached "migrated=true" entries remain correct for the process lifetime.The
nonMigratedNodesset, however, grows unbounded. Each ENSv1RegistryOld event on a not-yet-migrated node adds an entry, and the vast majority of v1 nodes are never migrated. Over a full backfill this approaches the distinct v1 nodes touched (potentially millions of ~66-byte hex keys plusSetoverhead). Given the measured +12% performance win this is acceptable, but consider documenting the expected steady-state size or swapping for an LRU (e.g.,lru-cache) to bound memory — the "migrated" side can stay unbounded since it's append-only and substantially smaller.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@apps/ensindexer/src/lib/protocol-acceleration/registry-migration-status.ts` around lines 11 - 48, nonMigratedNodes is unbounded and can grow huge during backfills; replace the plain Set nonMigratedNodes used by nodeIsMigrated with a bounded LRU (e.g., using lru-cache) or similar size-limited data structure to cap memory, leaving migratedNodes as the unbounded append-only Set; specifically, import and instantiate an LRU (or wrap a Map) named nonMigratedNodes with a configured max size/TTL, then update usages in nodeIsMigrated (use nonMigratedNodes.has(node) → nonMigratedNodes.get(node) or nonMigratedNodes.has depending on API, and replace nonMigratedNodes.add(node) with nonMigratedNodes.set(node, true)) and ensure migrateNode still removes entries from nonMigratedNodes if it references that symbol so the cache invalidation stays correct.apps/ensindexer/src/plugins/ensv2/handlers/ensv1/RegistrarController.ts (1)
104-107: 🧹 Nitpick | 🔵 TrivialCopy-paste:
NameRenewedinvariant error message saysNameRegistered.Pre-existing, but adjacent to the changes — the throw in
handleNameRenewedByControlleris labeledInvariant(RegistrarController:NameRegistered). Worth fixing while you're here.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@apps/ensindexer/src/plugins/ensv2/handlers/ensv1/RegistrarController.ts` around lines 104 - 107, The thrown invariant in handleNameRenewedByController incorrectly labels the error as "Invariant(RegistrarController:NameRegistered)"; update the thrown Error message in the branch that checks label vs labelHash (the conditional using label !== undefined && labelHash !== labelhashLiteralLabel(label)) to reference "Invariant(RegistrarController:NameRenewed)" (and keep the rest of the message content identical) so the error correctly reflects handleNameRenewedByController.apps/ensindexer/src/lib/indexing-engines/ponder.ts (1)
215-221: 🧹 Nitpick | 🔵 TrivialStale JSDoc:
eventHandlerPreconditionsis no longer “executes its logic once”.The docstring still says the function "is idempotent and will only execute its logic once", but
recordEventForEps()now runs on every call. Consider updating the comment to reflect that EPS accounting runs on every invocation while the setup/activation paths remain idempotent.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@apps/ensindexer/src/lib/indexing-engines/ponder.ts` around lines 215 - 221, Update the stale JSDoc for eventHandlerPreconditions to accurately describe its behavior: state that recordEventForEps() is executed on every invocation (so EPS accounting runs per event), while any setup/activation logic inside eventHandlerPreconditions remains idempotent and only runs once. Remove the phrase "will only execute its logic once" and explicitly document which parts are per-call (recordEventForEps) and which parts are idempotent.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@apps/ensindexer/src/lib/ensv2/label-db-helpers.ts`:
- Around line 35-37: Update the doc comment for the function that currently
tells callers to guard with ensureLabel; change the guidance to point callers to
check labelExists(...) === false instead (i.e., use labelExists to decide
whether to call ensureLabel to avoid duplicate ENSRainbow healing). Mention the
exact symbol labelExists and the expected boolean check (labelExists(...) ===
false) so the heal-dedup contract is clear.
In `@apps/ensindexer/src/lib/indexing-engines/ponder.ts`:
- Around line 190-206: EPS logging only happens when recordEventForEps() is
called, so throughput is never emitted if ingestion stalls; add a periodic flush
to emit the EPS log even during idle periods. Implement a background timer
(e.g., via setInterval) started alongside the indexing engine that calls a new
helper (or recordEventForEps with a flag) to compute and log the same metrics
using epsStartTime, epsTotalEvents, epsLastLogTime and EPS_LOG_INTERVAL_MS when
the interval has elapsed; ensure the timer is cleared on shutdown to avoid leaks
and document that it keeps the process alive while running.
In `@apps/ensindexer/src/plugins/ensv2/handlers/ensv1/ENSv1Registry.ts`:
- Around line 61-81: The code currently skips addr.reverse healing when
labelExists(context, labelHash) is true because ensureUnknownLabel() inserts
with onConflictDoNothing(), preventing later onConflictDoUpdate()-style upgrades
by ensureLabel(); add a clear comment above the labelExists check that explains
this behavior, referencing labelExists, ensureUnknownLabel, ensureLabel,
healAddrReverseSubnameLabel, onConflictDoNothing/onConflictDoUpdate, and the
ADDR_REVERSE_NODE/getENSRootChainId check, and state whether this trade-off is
intentional (or, if not intended, instruct to change ensureUnknownLabel to allow
upgrades or alter the conditional to attempt healing for addr.reverse even if a
placeholder row exists).
In `@apps/ensindexer/src/plugins/ensv2/handlers/ensv1/RegistrarController.ts`:
- Line 67: The inline comment text is wrong: replace the comment "ensure label
if exists" with "ensure label if not exists" at both occurrences inside
RegistrarController (look for the two comment lines near the logic that
creates/ensures a label, around the block in the RegistrarController class
handling label creation—one around where the code checks for missing label at
the first occurrence and the second near lines 110–111 where the duplicate
comment appears) so the comment correctly describes that the block runs only
when the label does not exist.
- Around line 67-75: The gating around labelExists prevents upgrading an
"unknown" row to a healed one because a prior ensureUnknownLabel can make
labelExists true and skip ensureLabel; update the logic in RegistrarController
(the block that calls labelExists, ensureLabel, and ensureUnknownLabel) to
always call ensureLabel(context, label) when label !== undefined (even if
labelExists returns true) so that the onConflictDoUpdate({ interpreted }) path
in label-db-helpers runs and upgrades interpreted; keep calling
ensureUnknownLabel(context, labelHash) only when label is undefined and the row
truly doesn't exist. Ensure you reference the functions labelExists,
ensureLabel, ensureUnknownLabel and the event path (handleNewOwner /
NameRegistered) while making this change.
In `@packages/ensindexer-perf-testing/grafana/dashboards/ponder.json`:
- Line 3: The dashboard UID in ponder.json ("uid": "ponder-ensindexer") does not
match the documented URL (/d/ensindexer); update the UID value in ponder.json to
"ensindexer" (replace "ponder-ensindexer" with "ensindexer") so the provisioned
Grafana dashboard matches the README link, and if you prefer to keep the current
UID instead, update the README/documented URL to point to /d/ponder-ensindexer
to keep them consistent.
- Around line 47-80: Panels are overlapping: update the gridPos.y values to
remove the overlap by moving panel id 12 ("Total events per handler") from y=28
to y=30, then cascade the subsequent panels as requested—set panels id 5 and 6
to y=40, panels id 7 and 8 to y=48, and panels id 9, 10, and 11 to y=56—so each
panel's gridPos.y is adjusted to the new values to eliminate row collisions.
In `@packages/ensindexer-perf-testing/package.json`:
- Line 11: The "wipe" npm script in package.json uses curl with the -s flag
which suppresses errors and causes the command to silently no-op if Prometheus
or its admin API isn't available; update the "wipe" script (referenced as the
"wipe" npm script) to remove the -s option or add --fail-with-body / explicit
failure handling so curl surfaces errors (or append a fallback that prints a
clear error when curl exits non-zero) ensuring operators see a useful failure
instead of silent no-op.
---
Outside diff comments:
In `@apps/ensindexer/src/lib/ensv2/account-db-helpers.ts`:
- Around line 7-41: The memo Set is typed as Set<string> which discards the
Address nominal type; change its declaration to use Set<Address> so the type
matches interpretAddress's return and the rest of the call sites—update the
ensuredAccounts declaration (symbol: ensuredAccounts) and ensureAccount
references if needed, keeping interpretAddress and ensureAccount behavior
unchanged.
In `@apps/ensindexer/src/lib/indexing-engines/ponder.ts`:
- Around line 215-221: Update the stale JSDoc for eventHandlerPreconditions to
accurately describe its behavior: state that recordEventForEps() is executed on
every invocation (so EPS accounting runs per event), while any setup/activation
logic inside eventHandlerPreconditions remains idempotent and only runs once.
Remove the phrase "will only execute its logic once" and explicitly document
which parts are per-call (recordEventForEps) and which parts are idempotent.
In `@apps/ensindexer/src/lib/protocol-acceleration/registry-migration-status.ts`:
- Around line 11-48: nonMigratedNodes is unbounded and can grow huge during
backfills; replace the plain Set nonMigratedNodes used by nodeIsMigrated with a
bounded LRU (e.g., using lru-cache) or similar size-limited data structure to
cap memory, leaving migratedNodes as the unbounded append-only Set;
specifically, import and instantiate an LRU (or wrap a Map) named
nonMigratedNodes with a configured max size/TTL, then update usages in
nodeIsMigrated (use nonMigratedNodes.has(node) → nonMigratedNodes.get(node) or
nonMigratedNodes.has depending on API, and replace nonMigratedNodes.add(node)
with nonMigratedNodes.set(node, true)) and ensure migrateNode still removes
entries from nonMigratedNodes if it references that symbol so the cache
invalidation stays correct.
In `@apps/ensindexer/src/plugins/ensv2/handlers/ensv1/RegistrarController.ts`:
- Around line 104-107: The thrown invariant in handleNameRenewedByController
incorrectly labels the error as "Invariant(RegistrarController:NameRegistered)";
update the thrown Error message in the branch that checks label vs labelHash
(the conditional using label !== undefined && labelHash !==
labelhashLiteralLabel(label)) to reference
"Invariant(RegistrarController:NameRenewed)" (and keep the rest of the message
content identical) so the error correctly reflects
handleNameRenewedByController.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: ASSERTIVE
Plan: Pro
Run ID: bd7b91e6-1e10-4b23-9430-340236697bc5
📒 Files selected for processing (13)
apps/ensindexer/src/lib/ensv2/account-db-helpers.tsapps/ensindexer/src/lib/ensv2/label-db-helpers.tsapps/ensindexer/src/lib/indexing-engines/ponder.tsapps/ensindexer/src/lib/protocol-acceleration/registry-migration-status.tsapps/ensindexer/src/plugins/ensv2/handlers/ensv1/ENSv1Registry.tsapps/ensindexer/src/plugins/ensv2/handlers/ensv1/RegistrarController.tspackages/ensindexer-perf-testing/README.mdpackages/ensindexer-perf-testing/docker-compose.ymlpackages/ensindexer-perf-testing/grafana/dashboards/ponder.jsonpackages/ensindexer-perf-testing/grafana/provisioning/dashboards/dashboards.ymlpackages/ensindexer-perf-testing/grafana/provisioning/datasources/prometheus.ymlpackages/ensindexer-perf-testing/package.jsonpackages/ensindexer-perf-testing/prometheus.yml
Per @greptile-apps review on #1989: the labelExists gate added in ef8a9c3 suppresses the onConflictDoUpdate upgrade path when ENSRainbow is unavailable. Flow: ENSv1Registry:NewOwner fires first and, if ENSRainbow can't heal, ensureUnknownLabel writes a hash-only row. When RegistrarController:NameRegistered subsequently arrives with the emitted plaintext `label`, the gate short-circuits the ensureLabel call — the hash-only row is never upgraded. Restructure both handleNameRegisteredByController and handleNameRenewedByController so that: - if plaintext `label` is emitted, always call ensureLabel (upgrade path via onConflictDoUpdate is essential) - otherwise, gate ensureUnknownLabel on !labelExists Also fixes the {@link} reference in ensureUnknownLabel's JSDoc to point at labelExists instead of ensureLabel (flagged by the same review).
The EPS logging added in 1895a9b made ponder.ts import the app logger, which in turn pulls in local-ponder-context and asserts the PONDER_COMMON runtime global. Tests don't set that global, so every test in ponder.test.ts failed at import time. Mock @/lib/logger alongside the existing vi.mock entries; unit tests pass again (all 18 ensindexer files, 201 tests).
The docstring claimed "will only execute its logic once" which was accurate for the Setup/Onchain preconditions but not for the EPS accounting call added alongside. Clarify that the memoization applies to the preconditions while EPS accounting runs per-event with minimal (Date.now + increment) overhead. Addresses PR #1989 review comments from copilot and vercel.
50cd494 to
88edd41
Compare
There was a problem hiding this comment.
Pull request overview
Copilot reviewed 15 out of 16 changed files in this pull request and generated 2 comments.
Files not reviewed (1)
- pnpm-lock.yaml: Language not supported
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
…gle Map Replaces the two Sets (migratedNodes + nonMigratedNodes) with a single Map<Node, boolean>. One structure, one lookup, no risk of the two sets drifting out of sync — migrateNode is just set(node, true), no companion delete needed. Semantics and restart behavior are unchanged.
Apply the same structure in both helpers: set the cache entry before the DB write and short-circuit eagerly when the cache already knows the answer, so repeat calls skip the DB round-trip entirely. Use `=== true` in migrateNode for explicit handling of the three-state Map<Node, boolean> (skip only on confirmed-migrated).
There was a problem hiding this comment.
Pull request overview
Copilot reviewed 15 out of 16 changed files in this pull request and generated no new comments.
Files not reviewed (1)
- pnpm-lock.yaml: Language not supported
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
summary
three perf fixes to the ensv2 plugin, found by profiling an
ensv2,protocol-accelerationbackfill against mainnet through a local Prometheus + Grafana setup (included as a new@ensnode/ensindexer-perf-testingpackage).ef8a9c32) —handleNewOwnerpreviously re-ranhealAddrReverseSubnameLabelon every NewOwner for an addr.reverse subname, each invocation doingeth_getTransactionReceipt+debug_traceTransaction. gated on a cheaplabelExistslookup — mirrors thedomain.name === nullguard the subgraph plugin already uses. per-event cost dropped 57% onENSv1Registry:NewOwnerand 62% onBaseRegistrar:NameRegistered; total indexer throughput +43% at the 10-min mark with rainbow stubbed.16abf3d9) — process-local Set short-circuits repeat account upserts. 40% cache hit rate over a 10M-event benchmark; wall-clock -5%.e6523a1d) — process-local bidirectional cache for a hot PK lookup called on every ENSv1RegistryOld event. over 1M events: wall-clock -11%, find ops -31%.the remaining
v1Domainwrite-collapse inhandleNewOwner/handleTransfer(the next-largest hotspot) is tracked in #1983 and will land separately.tooling
packages/ensindexer-perf-testing/— docker-compose bundle of Prometheus + Grafana with a pre-provisioned dashboard at/d/ensindexertuned for indexer perf work (top handlers by wall-clock share, p95 durations, RPC req/s, event-loop lag, etc). see the package README for usage.apps/ensindexer—eventHandlerPreconditionsnow emits a structuredIndexing throughputlog line every ~60s (cumulative events, elapsed, eps). one `Date.now()` per event; zero measurable overhead.